Abstract

We propose a new approach for estimating the parameters of a probability
distribution. It consists of combining two new methods of estimation. The first
is based on the definition of a new distance measuring the difference between
variations of two distributions on a finite number of points from their support, and
on using this measure for estimation purposes by the method of minimum distance.
For the second method, given an empirical discrete distribution, we build an
auxiliary discrete theoretical distribution having the same support as the first and
depending on the same parameters as the parent distribution of the data from
which the empirical distribution emanated. We then estimate the parameters
from the empirical distribution by the usual statistical methods. In practice, we
propose to compute both estimates: the second, based on the maximum likelihood
principle, has known theoretical properties, while the first, for which we prove
convergence in probability, serves as a control on the effectiveness of the obtained
estimate, so that we also have a criterion on the quality of the information contained
in the observations. We apply the approach to truncated or grouped and censored
data to give a flavour of its effectiveness. We also give some perspectives of the
approach, including model selection from truncated data, estimation of the initial
trial value in the celebrated EM algorithm in the case of truncation and merged
normal populations, a test of goodness of fit based on the new distance, and an
assessment of the quality of estimations and data.

Key words and phrases: EM algorithm, Minimum distance, Model selection from truncated 
data, Point estimation, Truncated data, Grouped and censored data. 
1 Introduction

Point estimation is one of the most popular forms of statistical inference (see Lehmann and
Casella [10]). We introduce in this paper a new statistical point estimation approach
which was found to be useful in special practical situations such as truncated and grouped
and censored data. The data are said to be truncated when measuring devices fail to
report observations below and/or above certain readings. For example, truncated data
frequently arise in the statistical analysis of astronomical observations (see Efron and
Petrosian [B]) and in medical data (see Klein and Zhang [H]), and if the truncation is
ignored this can cause considerable bias in the estimation. There exist in the literature
many approaches to estimation from "incomplete data", such as the maximum likelihood
based approach of the EM algorithm (Hartley [7], Dempster et al. [5]), or nonparametric
methods such as the Kaplan-Meier (Kaplan and Meier [5]) or Lynden-Bell estimators
(Lynden-Bell [H]). The purpose of the present paper is to investigate another approach,
which consists of combining two new methods of estimation, and to apply it in the fixed
type I censored or grouped and censored data situations.

In the first method, we remark that in estimation problems we deal in general with
three functions: a theoretical probability law $f(\cdot,\theta)$ of a random variable $X$,
depending on a parameter $\theta$ (real or vector valued), an empirical distribution $\hat f$
constructed from a sample of observations drawn from $X$, and an estimate $f(\cdot,\hat\theta)$
(from an estimate $\hat\theta$ of $\theta$) obtained through the empirical law $\hat f$. The
empirical distribution $\hat f$ is considered as a representative of $f$, but in practice it is
reduced to only a few of its characteristics, such as the mean and variance. The variational
aspect of $\hat f$ is often neglected despite its importance. We can easily find, for instance,
two distributions having the same support, mean and variance while their variations differ
significantly, or conversely having the same variations while their supports and characteristic
parameters are different. But two probability distributions with the same support and the same
variations in each subset of the support are necessarily the same. We therefore introduce a new
distance which measures the difference between the variations of two distributions on a finite
number of points, and we use it for estimation purposes by the method of minimum distance.
Since the new measure is not equivalent to classical ones, it can give insights that
could not be obtained with classical distances.

In the second method, we remark that the empirical distribution arising from a sample
of observations can in fact be viewed as a conditional distribution, as it is built from
the knowledge of the data. It is then an estimate of the theoretical conditional
distribution with respect to the observations before being an estimate of the parent
distribution. This theoretical conditional distribution is represented by the auxiliary
distribution introduced in this paper. To determine this distribution in the discrete case,
we simply take the conditional distribution with respect to the observed values,
and we proceed analogously in the continuous case. It should be noted that in the discrete
case it is known as the truncated distribution, which is the conditional distribution given
a truncation (see for example Shaw [13]), but it is presented here in a general framework.
We have to deal with two discrete probability distributions having the same finite
support: a theoretical distribution and its empirical representation with respect to the
observations. The parameters of the former are those of the parent distribution, and
the aim is to estimate them from the former rather than from the parent distribution, as is
commonly done. We use classical tools such as the method of moments or the maximum
likelihood principle. The setting that seems to us most suitable for illustrating our approach
is that of truncated or grouped and censored data. In usual practical problems, truncation can
be on the left, on the right, or on both sides, and the "cut off" can be deterministic or random.
In our approach, the truncation may be on any part of the range of the distribution,
so that the setting is more general. Also, classical approaches for truncated data are
in general custom-made for specific problems and distributions, or rely on subjective
choices. Instead, our approach is quite general and might be used in any situation
where the underlying complete data come from a known family of distributions. We
confine ourselves, as a first presentation, to fixed type I and grouped and censored data.

In the next section, we propose a variational distance between probability distributions.
In Section 3, we define a truncation of data and the associated empirical and
theoretical distributions, and we describe two different methods for estimation from a
truncation: a first method using the minimum of the new distance introduced in this paper, and a
second method based on traditional tools of estimation such as the method of maximum
likelihood. In Section 4, we present the new approach and illustrate the procedure
by three examples: a binomial probability law, a normal distribution and a Gamma
density function. We also present a basic feature of the new approach which supports the
accuracy of the method, together with some illustrative examples. In Section 5, we give some
elements of comparison with the classical approach of estimation. In Section 6, we list some
perspectives of the new approach: model selection from truncated data using the new
distance, estimation of the first trial value in the celebrated EM algorithm for incomplete
data in the case of truncation and merged normal distributions, a goodness of fit test
based on the new distance, and decision making about the quality of estimations and data.
Finally, we make some concluding remarks pointing to other possible extensions and
applications.

2 A New Distance Between Probability Distributions

As is usual, given a sample of $n$ independent and identically distributed observations
$(x_1, \ldots, x_n)$, drawn from an unknown discrete random variable $X$ belonging to a discrete
family of probability laws $\mathcal{P} = \{f(\cdot,\theta), \theta \in \Theta\}$ depending on a
parameter $\theta$ (real or vector valued), i.e., $f(x,\theta) = P(X = x)$, one can summarize the
sample into $k$ couples $(y_1, \hat f_1), \ldots, (y_k, \hat f_k)$, $k \le n$, where the $y_i$ are the
different values taken by the sample and $\hat f$ is the empirical law $\hat f_j = n_j/n$, where
$n_j$ represents the absolute frequency of the value $y_j$, $j = 1, \ldots, k$.

Usually, it is hoped that $\hat f_j \approx f(y_j,\theta)$, in a certain probabilistic sense. But if
the empirical distribution arises from truncated data, we cannot in general expect to have
$\hat f(x) \approx f(x,\theta)$ for the values $x$ in the support of $\hat f$, since the complete
sample size $n$ is usually not reported. However, we can reasonably expect to have approximately

$$\frac{\hat f(x)}{\hat f(y)} \approx \frac{f(x,\theta)}{f(y,\theta)} \tag{1}$$

for any points $x$, $y$ in its support, unless the sample has serious irregularities.

Introduce the following distance of proportional variations between $f(\cdot,\theta)$ and $\hat f$:

$$d_v(\hat f, f(\cdot,\theta)) = \sum_{i,j=1}^{k} \left| \frac{\hat f_i}{\hat f_j} - \frac{f(y_i,\theta)}{f(y_j,\theta)} \right|. \tag{2}$$

It turns out that this new distance, as we will show, measures the variations between
probability distributions.

In the continuous case also, any sample $x_1, \ldots, x_n$ is summarized into $k$ couples
$(y_1, \hat f_1), \ldots, (y_k, \hat f_k)$, $k \le n$. This can be done, for example, by grouping the
sample in classes where the $y_j$ are the mid-classes (or class means) and $\hat f_j = \hat f(y_j)$,
where $\hat f$ is an empirical density estimator, or the data may already be presented in a grouped
and censored form. The proportional variational distance $d_v$ in this case, between the density
$f(x,\theta)$ of $X$ and its empirical law $\hat f$, is defined by the same sum of absolute differences
of ratios. One of its main features is the following. When using traditional distances we have to
use the sample size $n$ through the expression $\hat f_i = n_i/(n h_n)$, where $h_n$ is the width of
the class intervals; but sometimes, as in truncated data situations where measuring devices fail to
report even the number of sample points in certain ranges, the real size $n$ is not known and a
truncated sample size $n_t$ is used instead. Using the ratios $\hat f_i/\hat f_j$ removes the effect of
the truncated sample size, which can otherwise lead to considerable bias in the estimation.
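
The remark on the sample size can be checked numerically. The following is a minimal sketch, assuming the discrete form of the distance (a sum over ordered pairs of absolute differences of ratios); the function name `variation_distance`, the counts and the model values are hypothetical and serve only as an illustration.

```python
from itertools import product


def variation_distance(emp, theo):
    """Sum over ordered pairs (i, j) of |emp_i/emp_j - theo_i/theo_j|.

    emp and theo are sequences of strictly positive values given on the
    same points; only ratios enter, so neither has to be normalized.
    """
    if len(emp) != len(theo):
        raise ValueError("both distributions must be given on the same points")
    idx = range(len(emp))
    return sum(abs(emp[i] / emp[j] - theo[i] / theo[j])
               for i, j in product(idx, idx))


# Hypothetical counts on three retained points (illustration only).
counts = [30, 50, 20]
# Hypothetical model values at the same points.
theo = [0.25, 0.45, 0.30]

# Raw counts, counts divided by the truncated size (100), and counts divided
# by some other guess of the complete size (250) all give the same distance.
d_counts = variation_distance(counts, theo)
d_nt = variation_distance([c / 100 for c in counts], theo)
d_n = variation_distance([c / 250 for c in counts], theo)
print(d_counts, d_nt, d_n)
```

Because only ratios of frequencies are used, the three printed values agree up to rounding error, whatever divisor is applied to the counts.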

Note that $d_v$ possesses the properties of symmetry and triangle inequality. But in
the identity property $d_v(f,g)(x,y) = 0 \iff f = g$, the equality between $f$ and $g$ must
be understood in the sense that $f$ and $g$ have the same variations on the points $x$ and
$y$. It should be stressed that this new measure is not equivalent to classical ones and
should then give new insights and information about other characteristics and features
of probability distributions.

From now on $f$ shall represent a theoretical probability law in both the discrete and
continuous cases, and $\hat f$ shall represent the corresponding empirical law in both cases.
Denote by $\Omega = \{x \in \mathbb{R} : f(x,\theta) > 0\}$ the support of $f$ (its set of atoms in the
discrete case). Let $\mathcal{F}$ be the $\sigma$-algebra generated by the sets $A = B \cap \omega$,
where the $\omega$ are the Borel sets of $\mathbb{R}$ and $B \subseteq \Omega$. For all
$A \in \mathcal{F}$, we have $P(A) = \int_A f(x,\theta)\,\mu(dx)$, where $\mu$ is the Lebesgue measure
on $\mathbb{R}$. In the discrete case, we have $P(A) = \sum_{x \in A} f(x,\theta)$.

For all $i \ge 1$, we set $\Omega_i = \Omega$, $\mathcal{F}_i = \mathcal{F}$ and $P_i = P$. Let
$\Omega^n = \Omega_1 \times \ldots \times \Omega_n$,
$\mathcal{F}^{(n)} = \mathcal{F}_1 \otimes \ldots \otimes \mathcal{F}_n$ and
$P^{(n)} = P_1 \otimes \ldots \otimes P_n$. The probability space $(\Omega^n, \mathcal{F}^{(n)}, P^{(n)})$
represents the space of samples of size $n$ from the random variable $X$. We omit the
index $n$ in $(\Omega^n, \mathcal{F}^{(n)}, P^{(n)})$ for notational convenience and shall denote the
sample space by $(\Omega, \mathcal{F}, P)$.

2.1 A Notion of Variation between Probability Distributions

We now discuss the measure-theoretic aspect of the new distance introduced above.
Let $P$ and $Q$ be two probability measures defined on the same measurable space
$(\Omega, \mathcal{F})$, $f$ and $g$ their respective probability densities, not necessarily with respect
to the same measure, and $E$ an event of this space. We say that $f$ and $g$ have the same variation
on $E$ if the respective restrictions of $f$ and $g$ to $E$ define the same probability measure on
$E$ endowed with the trace $\sigma$-algebra of $\mathcal{F}$ on $E$.

Definition 1 Let $f$ and $g$ be two positive probability distributions defined on a set $E$
not reduced to a single element. If at any point $(x, y)$ of $E \times E$ we have

$$\frac{f(x)}{f(y)} = \frac{g(x)}{g(y)}, \tag{3}$$

then we say that $f$ and $g$ have the same variations on $E$.

Example 2 Let $f$ be a density of a probability measure $P$ and $E$ an event such that
$P(E) > 0$. The restriction of $f$ to $E$ and the conditional distribution of $f$ with respect
to $E$ define the same probability measure on $E$, and consequently they have the same
variations on $E$.



Definition 3 Let $f$ and $g$ be two probability distributions and $E$ an event on which they
are strictly positive. If $E$ is discrete and not reduced to a single element, one of the
distributions $f$ and $g$ being discrete while the other need not be, we call distance
in variations between $f$ and $g$ on $E$ the quantity

$$d_v(f,g)_E = \sum_{(x,y) \in E \times E} \left| \frac{f(x)}{f(y)} - \frac{g(x)}{g(y)} \right|.$$

If $E$ is an interval of $\mathbb{R}$ and $f$ and $g$ are probability densities on $\mathbb{R}$ with respect
to the Lebesgue measure $\mu$ on $\mathbb{R}$, we call distance in variations between $f$ and $g$ on $E$
the quantity

$$d_v(f,g)_E = \iint_{E \times E} \left| \frac{f(x)}{f(y)} - \frac{g(x)}{g(y)} \right| \mu(dx)\,\mu(dy).$$



Let $d$ be a classical distance between two functions $f$ and $g$ which associates with
points $x$ and $y$ from the intersection of their domains of definition the quantity
$d(f,g)(x,y) = |f(x) - g(x)| + |f(y) - g(y)|$.

Proposition 4 We have the following properties for the distance $d_v$:

1. $d(f,g)(x,y) = 0 \Rightarrow d_v(f,g)(x,y) = 0$; the converse is not always true.

2. Let $\hat f$ be a kernel density estimator of $f$ based on a sample of size $n$. Then
$\lim_{n \to \infty} d_v(\hat f, f) = 0$ in probability.

3. Let $f$ and $g$ be two functions defined on $\mathbb{R}$ and $E \subset \mathbb{R}$ satisfying

$$\forall (x,y) \in E \times E, \quad d_v(f,g)(x,y) = 0, \qquad \int_E f\,d\mu = \int_E g\,d\mu = 1,$$

where $\mu$ is the Lebesgue measure on $\mathbb{R}$. Then $f\,\mathbb{1}_E = g\,\mathbb{1}_E$
$\mu$-almost surely on $\mathbb{R}$.

Proof. 1. Follows directly from the definitions of $d$ and $d_v$.

2. Follows from the fact that $\lim_{n \to \infty} d(\hat f, f) = 0$ in probability (see Parzen [12]);
then $\lim_{n \to \infty} d_v(\hat f, f) = 0$ in the same probabilistic notion of convergence.

3. Fix $y_0 \in E$; we have $f(x)/f(y_0) = g(x)/g(y_0)$ for all $x \in E$. This implies that

$$\int_E f(x)\,dx = \int_E f(y_0)\,\frac{g(x)}{g(y_0)}\,dx = \frac{f(y_0)}{g(y_0)} \int_E g(x)\,dx = 1.$$

Since $\int_E g(x)\,dx = 1$, we deduce that $f(y_0) = g(y_0)$, and the result follows. ■
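
Example 2 and item 1 of Proposition 4 can be illustrated with a short computation. The sketch below is only an illustration under assumed values: a binomial law with 10 trials and $p = 0.3$, and the event $E = \{2,3,4,5\}$; the helper names are hypothetical. It compares a law $f$ with its conditional law $g$ given $E$: the distance in variations vanishes on every pair of points of $E$, while the classical distance $d$ does not.

```python
from math import comb


def binom_pmf(y, n, p):
    """Binomial probability mass function."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


def d_classical(f, g, x, y):
    """d(f, g)(x, y) = |f(x) - g(x)| + |f(y) - g(y)|."""
    return abs(f(x) - g(x)) + abs(f(y) - g(y))


def d_variation(f, g, x, y):
    """Pointwise distance in variations |f(x)/f(y) - g(x)/g(y)|."""
    return abs(f(x) / f(y) - g(x) / g(y))


E = [2, 3, 4, 5]                      # assumed event, for illustration


def f(x):
    return binom_pmf(x, 10, 0.3)      # assumed parent law


mass_E = sum(f(x) for x in E)


def g(x):
    return f(x) / mass_E              # conditional law of f given E


pairs = [(x, y) for x in E for y in E]
max_dv = max(d_variation(f, g, x, y) for x, y in pairs)
min_d = min(d_classical(f, g, x, y) for x, y in pairs)
print(max_dv, min_d)
```

Here `max_dv` is zero up to rounding error, whereas `min_d` is strictly positive, so $d_v = 0$ does not force $d = 0$.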



3 Truncated Data 

The truncated data specification, or generally the incomplete data specification, implies the
existence of two sample spaces $\mathcal{X}_o$ and $\mathcal{X}_t$, such that the complete sample space
is given by $\Omega = \mathcal{X}_o \cup \mathcal{X}_t$. The observed data
$x_o = (x_1, \ldots, x_{n_t})$, where $n_t$ is the truncated sample size, are a realization from
$\mathcal{X}_o$, and the unobserved data $z = (x_1^*, \ldots, x_{n-n_t}^*)$, where $n$ is the complete
unknown sample size, are from $\mathcal{X}_t$. The complete data $x = x_o \cup z$ is known only through
the observed data $x_o$ (see Dempster, Laird and Rubin [5] for further explanations about
the incomplete data specification).

Consider a sample of observations $x_1, \ldots, x_n$ drawn from a theoretical probability law
$f(\cdot,\theta)$, depending on a parameter $\theta \in \mathbb{R}^r$. As usual, the data are summarized,
in the discrete or continuous case (as shown in Section 2), into $k$ couples
$(y_1, \hat f_1), \ldots, (y_k, \hat f_k)$, $k \le n$. Let $A = \{u_1, \ldots, u_m\}$ be a subset of
$\{y_1, \ldots, y_k\}$, $m \le k$, which we will call a truncation. The observed data are summarized by a
truncation $A_o = \{u_1, \ldots, u_m\}$ and an empirical estimation $\hat f_o$, and we assume that the
unobserved data are also summarized by a set $A_t = \{u_1^*, \ldots, u_{k-m}^*\}$ and an empirical
estimation $\hat f_t$.

The structure of the new distance $d_v$ allows the following decomposition property:

$$d_v(\hat f, f(\cdot,\theta)) = d_v(\hat f_o, f(\cdot,\theta)) + d_v(\hat f_t, f(\cdot,\theta))
+ \sum_{u_i \in A_o} \sum_{u_j^* \in A_t} \left| \frac{\hat f_t(u_j^*)}{\hat f_o(u_i)} - \frac{f(u_j^*,\theta)}{f(u_i,\theta)} \right|
+ \sum_{u_i \in A_o} \sum_{u_j^* \in A_t} \left| \frac{\hat f_o(u_i)}{\hat f_t(u_j^*)} - \frac{f(u_i,\theta)}{f(u_j^*,\theta)} \right|. \tag{4}$$

The following proposition is typical of the new distance and is useful when using the
minimum of the distance $d_v$.

Proposition 5 Let $A_o$ be a truncation of the data with corresponding empirical estimation
$\hat f_o$. Then $\lim_{n \to \infty} d_v(\hat f_o, f) = 0$ in probability.

Proof. We have from Proposition 4 that $\lim_{n \to \infty} d_v(\hat f, f) = 0$ in probability. Since
all the terms of the decomposition (4) are nonnegative, we obtain
$\lim_{n \to \infty} d_v(\hat f_o, f) = \lim_{n \to \infty} d_v(\hat f_t, f) = 0$ in probability. ■
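
The decomposition of the distance over observed and unobserved points can be verified numerically. This is a minimal sketch under assumed inputs: the frequencies `emp`, the model values `theo` and the split into `A_o` and `A_t` are hypothetical, and `dv` is a helper written for this illustration.

```python
def dv(emp, theo, points_a, points_b):
    """Sum of |emp[u]/emp[v] - theo[u]/theo[v]| for u in points_a, v in points_b."""
    return sum(abs(emp[u] / emp[v] - theo[u] / theo[v])
               for u in points_a for v in points_b)


# Hypothetical empirical frequencies and model values on five points.
emp = {0: 0.10, 1: 0.30, 2: 0.35, 3: 0.20, 4: 0.05}
theo = {0: 0.12, 1: 0.28, 2: 0.33, 3: 0.19, 4: 0.08}

A_o = [1, 2, 3]        # observed points
A_t = [0, 4]           # unobserved points
full = A_o + A_t

# Left-hand side: distance over all pairs of the complete support.
lhs = dv(emp, theo, full, full)
# Right-hand side: observed part, unobserved part, and the two cross sums.
rhs = (dv(emp, theo, A_o, A_o) + dv(emp, theo, A_t, A_t)
       + dv(emp, theo, A_t, A_o) + dv(emp, theo, A_o, A_t))
print(lhs, rhs)
```

Every term on the right is nonnegative, so the distance restricted to the observed points can never exceed the distance on the complete support, which is the fact used in the proof of Proposition 5.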

3.1 An Auxiliary Distribution 

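Define the empirical distribution $\hat f$ corresponding to a given truncation $A$ by

$$\hat f(x) = \begin{cases} \tilde f_i & \text{if } x = u_i,\ i = 1, \ldots, m, \\ 0 & \text{otherwise,} \end{cases}$$

where the $\tilde f_i$ satisfy the following set of proportional allocation equations:
$\tilde f_i/\tilde f_j = \hat f_i/\hat f_j$ for $i,j = 1, \ldots, m$, and $\tilde f_1 + \ldots + \tilde f_m = 1$.

Define the following auxiliary distribution from $f(\cdot,\theta)$, which is akin to the
proportional allocation procedure for missing values (see Hartley [7]):

$$h(x,\theta) = \begin{cases} \dfrac{f(x,\theta)}{f(u_1,\theta) + f(u_2,\theta) + \ldots + f(u_m,\theta)} & \text{if } x = u_i,\ i = 1, \ldots, m, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$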

Remark 6 If the truncation is random, that is, there exists a random variable $T$ such
that we observe, for example, the random variable $X$ only if $X > T$ or $X < T$, then
the probability law used in (5) is replaced by the conditional law of $X$ with respect to
$\{X > T\}$ or $\{X < T\}$ respectively.

The auxiliary distribution $h$ was found to be useful for estimation problems with truncated
data. Indeed, it is well known in classical estimation from truncated data (see Hartley
[7]) that missing values can be recovered by "proportional allocation" procedures; the
auxiliary distribution $h$, which is itself based on proportional allocation, is therefore
an intuitive and natural tool for estimation from truncated data. The function
$h$ is a theoretical probability distribution depending on the same parameters as
$f$. It also has the same support as $\hat f$.
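
The two objects defined in this subsection can be built in a few lines. The sketch below is illustrative only: the function names, the Poisson law with mean 2 and the observed counts are assumptions made for the example, not part of the method's specification.

```python
from math import exp, factorial


def auxiliary_distribution(pmf, truncation):
    """Return h as a dict: the law pmf renormalized on the truncation points."""
    total = sum(pmf(u) for u in truncation)
    return {u: pmf(u) / total for u in truncation}


def empirical_on_truncation(counts):
    """Proportional allocation: observed counts rescaled to sum to one."""
    n_t = sum(counts.values())
    return {u: c / n_t for u, c in counts.items()}


def poisson(x):
    """Poisson law with mean 2, chosen here only as an example."""
    return exp(-2.0) * 2.0**x / factorial(x)


A = [1, 2, 3]
h = auxiliary_distribution(poisson, A)
# Hypothetical observed counts on the same points.
f_hat = empirical_on_truncation({1: 40, 2: 38, 3: 22})
print(h, f_hat)
```

Both dictionaries sum to one on the same support, and the ratios `h[u] / h[v]` coincide with those of the parent law, so the parameters of the parent law can be read off from `h`.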

Definition 7 We call $\hat f$ and $h(\cdot,\theta)$ the empirical and theoretical distributions of a
given truncation $A = \{u_1, \ldots, u_m\}$ from a sample of observations $(x_1, \ldots, x_n)$.



4 The Approach of Estimation 

We will use mainly two methods of estimation. The first method is a minimum distance
estimation using the metric $d_v$ between the empirical and theoretical distributions $\hat f$ and
$f(\cdot,\theta)$. The second is similar to traditional ones, such as the method of substitution or
the maximum likelihood principle, and considers $\hat f$ as an empirical estimation of $h(\cdot,\theta)$.
The first is based on a variational difference between distributions and the second on
a Euclidean difference, and hence they treat different aspects of the sample of
observations. If for given data they yield different estimates, we cannot blame the
approaches, but we can say that the data do not restore in a coherent way all aspects
of the probability distribution from which they emanated. If on the other hand they yield
essentially the same estimates, we can assert that the estimate is credible, since
through different aspects it has given the same distribution, namely the distribution
which best fits the empirical distribution. In practice, we propose to compute the
estimates by the two methods and to retain the second one, since it is based on the maximum
likelihood principle, whose theoretical properties are well known. We then use the first as a
tool for deciding whether the estimate is credible or not. The estimate will
be considered credible in cases where the two methods give approximately the same
result.

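4.1 Convergence in Probability of the Minimum Distance Estimator

Let $X_1, X_2, \ldots, X_n$ be a sample with $X_i \sim f(x,\theta)$,
$\theta = (\theta_1, \ldots, \theta_s)^t \in \Theta \subset \mathbb{R}^s$, with

$$f(x,\theta) = K(x) \exp\left[ \sum_{k=1}^{s} \theta_k T_k(x) + A(\theta) \right], \tag{6}$$

$x \in \mathcal{X} \subset \mathbb{R}$, where $\mathcal{X}$ is a Borel set of $\mathbb{R}$ such that
$\mathcal{X} = \{x : f(x,\theta) > 0\}$ for all $\theta \in \Theta$.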

The family (6) is very rich; it contains, for example, the family of normal
laws and the family of Poisson laws. We assume that the support $\mathcal{X}$ does not
depend on $\theta$. Denote by $\hat\theta_n$ the estimator by the minimum of the metric $d_v$ between
the empirical and theoretical distributions $\hat f_n$ (based on a sample of size $n$) and
$f(\cdot,\theta)$, that is

$$\hat\theta_n = \arg\min_{\theta} d_v(f(\cdot,\theta), \hat f_n).$$

This estimator falls into the class of M-estimators. Using well known theorems on the
convergence of M-estimators (see for example Amemiya [1]) we will prove that $\hat\theta_n$
converges in probability to the true parameter.

Proposition 8 Let $X_1, X_2, \ldots, X_n$ be a sample from the family of distributions (6). If
the set of natural parameters $\Theta$ is convex and the true parameter $\theta$ is an interior point
of $\Theta$, then the estimator $\hat\theta_n$ by the minimum of the distance of variations $d_v$
converges in probability to the true parameter $\theta$, i.e.,

$$\hat\theta_n \xrightarrow{\;P\;} \theta.$$

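Proof. Since we search for a minimum of the criterion function $d_v$, it suffices to show,
under the assumptions of the family (6) and the convexity of the set $\Theta$, that $d_v(\theta,x)$
seen as a function of $\theta$ is a convex function (see Amemiya [1]). This reduces the
problem to the convexity of

$$\delta_{ij}(\theta) = \left| \frac{f(y_i,\theta)}{f(y_j,\theta)} - \frac{\hat f(y_i)}{\hat f(y_j)} \right|.$$

For $\lambda, \mu \in \mathbb{R}_+$ with $\lambda + \mu = 1$, and $\theta^{(1)}, \theta^{(2)} \in \Theta$, we have

$$\delta_{ij}(\lambda\theta^{(1)} + \mu\theta^{(2)}) = \left| C_{ij} \exp\left\{ \sum_{k=1}^{s} \left[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\right] \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij} \right|,$$

where $C_{ij} = K(y_i)/K(y_j)$, which we assume to satisfy $C_{ij} > 0$, and
$A_{ij} = \hat f(y_i)/\hat f(y_j)$. From the convexity of the exponential function we have

$$\exp\left\{ \sum_{k=1}^{s} \left[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\right] \left(T_k(y_i) - T_k(y_j)\right) \right\}
\le \lambda \exp\left\{ \sum_{k=1}^{s} \theta_k^{(1)} \left(T_k(y_i) - T_k(y_j)\right) \right\}
+ \mu \exp\left\{ \sum_{k=1}^{s} \theta_k^{(2)} \left(T_k(y_i) - T_k(y_j)\right) \right\},$$

then

$$C_{ij} \exp\left\{ \sum_{k=1}^{s} \left[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\right] \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij}
\le \lambda C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(1)} \left(T_k(y_i) - T_k(y_j)\right) \right\}
+ \mu C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(2)} \left(T_k(y_i) - T_k(y_j)\right) \right\} - (\lambda + \mu) A_{ij}$$

$$\le \lambda \left[ C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(1)} \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij} \right]
+ \mu \left[ C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(2)} \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij} \right].$$

Introducing the absolute value we get

$$\delta_{ij}(\lambda\theta^{(1)} + \mu\theta^{(2)}) = \left| C_{ij} \exp\left\{ \sum_{k=1}^{s} \left[\lambda\theta_k^{(1)} + \mu\theta_k^{(2)}\right] \left(T_k(y_i) - T_k(y_j)\right) \right\} - (\lambda + \mu) A_{ij} \right|$$

$$\le \lambda \left| C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(1)} \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij} \right|
+ \mu \left| C_{ij} \exp\left\{ \sum_{k=1}^{s} \theta_k^{(2)} \left(T_k(y_i) - T_k(y_j)\right) \right\} - A_{ij} \right|. \tag{7}$$

Hence $\delta_{ij}(\theta)$ is a convex function of $\theta$, which implies the convexity of $d_v(\theta,x)$
seen as a function of $\theta$, and then the convergence in probability of the minimum of distance
$d_v$ estimator. ■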
4.2 A Maximum Likelihood Principle with the Auxiliary Distribution

We begin with a general situation, that of the one-parameter exponential family, to
show how the second method is applied. Consider the one-parameter exponential family
with density

$$f(x,\theta) = K(x) \exp[\theta T(x) - A(\theta)], \tag{8}$$

where $\theta$ is the parameter, $T$ a statistic, $K(x)$ a function of $x$ and $A$ a function of
the parameter $\theta$. Let us use the maximum likelihood principle. Consider a sample
of observations $x_1, \ldots, x_n$ from which we derive the support $A = \{y_1, \ldots, y_k\}$, the value
$y_i$ being observed $n_i$ times. We then construct the auxiliary distribution from the support $A$,
expressed in the following form:

$$h(x,\theta) = \frac{K(x) \exp[\theta T(x) - A(\theta)]}{\sum_{i=1}^{k} K(y_i) \exp[\theta T(y_i) - A(\theta)]}. \tag{9}$$

We have to maximize the likelihood function, given in our case by

$$L_h(y,\theta) = \prod_{i=1}^{k} h(y_i,\theta)^{n_i}. \tag{10}$$

Without loss of generality, we assume that the class intervals are the same. Then, we
have

$$\log L_h(y,\theta) = \sum_{i=1}^{k} n_i \log h(y_i,\theta) = \sum_{i=1}^{k} n_i \log \frac{K(y_i) \exp[\theta T(y_i) - A(\theta)]}{\sum_{l=1}^{k} K(y_l) \exp[\theta T(y_l) - A(\theta)]}. \tag{11}$$

Taking the derivative and solving the score equation in $\theta$, we obtain an estimator of the
parameter $\theta$ satisfying the relation

$$\sum_{i=1}^{k} \frac{n_i}{n}\, T(y_i) = \frac{\sum_{i=1}^{k} T(y_i)\, f(y_i,\theta)}{\sum_{i=1}^{k} f(y_i,\theta)}, \tag{12}$$

where $n = n_1 + \ldots + n_k$.

The latter result may be obtained directly by the method of moments, but we have
presented the maximum likelihood method since it is widely used in statistical inference.

In order to test the performance of the proposed approach, we use synthetic data sets
generated by simulation from three examples of probability law: a binomial
law, a normal density and a Gamma distribution. The examples were selected from various
simulation studies on different families of probability distributions, in which the two methods
proved effective and did not deviate significantly from the true parameter.
The reason for using synthetic data sets is that their true parameters
are known, so that the accuracy of the results obtained by the two new methods
can be assessed.

4.3 Examples

Binomial distribution. We generated a synthetic data set of size 500 from a binomial
law $B(n,p)$ with $n = 10$ and $p = 0.3$, and denote by $f(y;p) = C_n^y\, p^y (1-p)^{n-y}$ its
probability mass function. The data are summarized in the following table.



Table 1.

y_i     0     1     2     3     4     5     6     7
n_i    15    71   108   134    97    47    23     5

Our aim is to estimate the parameter $p$, with the knowledge of $n = 10$, from different
truncations of the data.

To illustrate the two methods, consider the truncation $A = \{2, 3, 4, 5\}$ with
truncated sample size $n_t = 386$. Writing $N = 500$ for the size of the complete sample, we then
have a truncation proportion of $Q = 100(N - n_t)/N = 22.8\%$ in the data. For the first method, we
have to search for the value of the parameter $p$ which minimizes the distance $d_v$, that is:

$$\min_p d_v(\hat f, f) = \min_p \sum_{y_i, y_j \in A} \left| \frac{f(y_i,p)}{f(y_j,p)} - \frac{n_i}{n_j} \right|.$$

Using a computer algebra package, we obtain the result $\hat p_1 = 0.299$.

For the second method, the empirical distribution $\hat f$ given the truncation
$A = \{2,3,4,5\}$ is given by $\hat f(2) = 108/386$, $\hat f(3) = 134/386$, $\hat f(4) = 97/386$,
$\hat f(5) = 47/386$ and $\hat f(x) = 0$ if $x \notin A$.

The auxiliary distribution $h(\cdot,p)$ is given by:

$$h(x,p) = \begin{cases} \dfrac{f(x,p)}{f(2,p) + f(3,p) + f(4,p) + f(5,p)} & \text{if } x = u_i \in \{2,3,4,5\}, \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

By the method of substitution, the estimate of $p$ is obtained by solving the equation:

$$\sum_{u_i \in \{2,3,4,5\}} u_i \times h(u_i,p) = \sum_{u_i \in \{2,3,4,5\}} u_i \times \hat f(u_i). \tag{14}$$

Using a computer algebra package we obtain the result $\hat p_2 = 0.3$.
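
The two computations for the truncation $\{2,3,4,5\}$ can be sketched without a computer algebra package. The code below is one possible implementation, not the one used for the reported results: the grid search for the first method and the bisection for the second are choices made for this sketch, and the function names are hypothetical. The counts are those of Table 1.

```python
from math import comb

N_TRIALS = 10                                 # known binomial parameter n
counts = {2: 108, 3: 134, 4: 97, 5: 47}       # Table 1 restricted to A = {2, 3, 4, 5}
A = sorted(counts)


def pmf(y, p):
    """Binomial probability mass function f(y; p)."""
    return comb(N_TRIALS, y) * p**y * (1 - p)**(N_TRIALS - y)


def d_v(p):
    """Distance in variations between the model and the observed counts on A."""
    return sum(abs(pmf(u, p) / pmf(v, p) - counts[u] / counts[v])
               for u in A for v in A)


def aux_mean(p):
    """Mean of the auxiliary distribution h(., p) on A."""
    total = sum(pmf(u, p) for u in A)
    return sum(u * pmf(u, p) for u in A) / total


# First method: brute-force grid search over p (a choice of this sketch).
grid = [k / 10000 for k in range(500, 9501)]
p1 = min(grid, key=d_v)

# Second method: solve aux_mean(p) = empirical mean by bisection,
# using the fact that aux_mean is increasing in p.
n_t = sum(counts.values())
target = sum(u * c for u, c in counts.items()) / n_t
lo, hi = 0.01, 0.99
for _ in range(60):
    mid = (lo + hi) / 2
    if aux_mean(mid) < target:
        lo = mid
    else:
        hi = mid
p2 = (lo + hi) / 2

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")   # prints "p1 = 0.299, p2 = 0.300"
```

A grid search is used for the first method because the criterion is a sum of absolute values and is therefore not differentiable at its minimum; any derivative-free one-dimensional minimizer could be substituted.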

In the following table we present the estimates $\hat p_1$ from the first method, the
minimum distance approach using the distance $d_v$, and $\hat p_2$ from the auxiliary distribution,
of the parameter $p$, for known $n$, according to the truncation $A = \{u_1, \ldots, u_m\}$ considered.

Table 2. The estimates $\hat p_1$ and $\hat p_2$ by the new approach of the parameter $p$
of the binomial probability law $B(n,p)$ with $p = 0.3$ and known $n = 10$.

No.   A                     Truncated          Proportion of        p_1      p_2
                            sample size n_t    truncation Q (%)
 1    {0,1,2,3,4,5,6,7}     500                 0                   0.305    0.298
 2    {0,1,2,3,4,5}         472                 5.6                 0.295    0.293
 3    {1,2,3,4,5}           457                 8.6                 0.288    0.292
 4    {0,1,2,3,4}           425                15                   0.295    0.293
 5    {1,2,3,4}             410                18                   0.287    0.292
 6    {0,2,3,4,5}           401                19.8                 0.295    0.298
 7    {2,3,4,5}             386                22.8                 0.299    0.3
 8    {0,1,3,4,5}           364                27.2                 0.295    0.289
 9    {0,2,3,4}             354                29.2                 0.295    0.301
10    {1,3,4,5}             349                30.2                 0.287    0.287
11    {2,3,4}               339                32.2                 0.305    0.305
12    {0,3,4,5}             293                41.4                 0.295    0.293
13    {2,4,5,6,7}           280                44                   0.308    0.307
14    {0,1,2,5,6,7}         269                46.2                 0.298    0.299
15    {0,1,4,5,6,7}         258                48.4                 0.3013   0.295
16    {0,4,5,6,7}           187                62.6                 0.3071   0.302
17    {0,5,6,7}              90                82                   0.3014   0.301
18    {0,5}                  62                87.6                 0.2937   0.294

As previously said, the two estimates by the new approach, $\hat p_1$ and $\hat p_2$, are accurate
in all cases and close to each other. Furthermore, in this experiment the truncation proportion
has no visible effect on the quality of the estimates. The two estimates are also not sensitive to
small cell probabilities, as for truncations including the value $y_8 = 7$. It should be noted that
the classical estimate by maximum likelihood without truncation is $\hat p = 0.297$, and
with our approach we obtained the estimates $\hat p_1 = 0.3053$ for the first method
and $\hat p_2 = 0.2978$ for the second.

Normal distribution. Consider a sample of size 400 drawn from a normal population
with mean $m = 0$ and standard deviation $\sigma = 1$. Consider the data falling in 11 fixed
class intervals as shown in the following table, with mid-classes $y_i$ and absolute
frequencies $n_i$.



Table 3.

y_i   -2.581   -2.06   -1.533   -1.009   -0.485   0.039   0.563   1.086   1.610   2.134   2.658
n_i    5        8      23       48       71       89      72      43      25      10       6

The number of bins can be selected by an optimal procedure developed by Birge
and Rozenholc [2]. In the following table we estimate $m$ and $\sigma$ simultaneously
by the minimum distance procedure with $d_v$. We denote the estimates by $\hat m_1$ and
$\hat\sigma_1$. In each line of the table the estimates are computed from the table of frequencies
restricted to the observations indicated in the first column. The truncated sample size is
denoted by $n_t$. With $n = 400$, we then have a truncation proportion of $Q = 100(n - n_t)/n$ in
the data.

Table 4.

A                                        n_t    Q (%)    m_1           sigma_1
{y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11}     400     0       [illegible]   [illegible]
{y1,y2,y3,y4,y5,y6,y7,y8,y9}             384     4       [illegible]   [illegible]
{y2,y3,y4,y5,y6,y7,y8,y9}                379     5.25    [illegible]   [illegible]
{y3,y4,y5,y6,y7,y8,y9}                   371     7.25    [illegible]   [illegible]
{y4,y5,y6,y7,y8,y9}                      348    13       [illegible]   [illegible]
{y5,y6,y7,y8,y9}                         300    25        0.052        1.012
{y3,y4,y5,y6}                            231    42.25     0.303        1.104
{y6,y7,y8,y9}                            229    42.75    -0.225        1.140
{y6,y7,y8}                               204    49       -0.065        1.052
{y3,y5,y7}                               166    58.5      0.052        0.993
{y2,y3,y4,y5}                            150    62.5     -0.137        0.904
{y3,y4,y5}                               142    64.5     -0.151        0.893

Remark 9 In practice, the bins are chosen after obtaining the truncated sample,
so the results should be more efficient. This does not affect the preceding results,
obtained by grouping the whole sample and then truncating the bins, since the aim is to
give some feel for the accuracy of the estimates. We can also avoid grouping the
observations by considering empirical frequencies obtained from kernel density estimates.
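
The simultaneous minimum distance estimation of $m$ and $\sigma$ can be sketched as follows for the truncation $\{y_5, \ldots, y_9\}$ of Table 3. The optimizer is not specified in the text; the compass (pattern) search, the starting point taken from the grouped moments of the retained bins, and all function names are assumptions of this sketch, and a different optimizer may reach a slightly different minimizer.

```python
from math import exp, sqrt

# Table 3: mid-classes and absolute frequencies of the 11 bins.
mids = [-2.581, -2.06, -1.533, -1.009, -0.485, 0.039,
        0.563, 1.086, 1.610, 2.134, 2.658]
freqs = [5, 8, 23, 48, 71, 89, 72, 43, 25, 10, 6]


def d_v(m, s, keep):
    """Distance in variations on the retained bins (indices in keep)."""
    # Log of the normal density up to a constant that cancels in every ratio.
    logd = {i: -0.5 * ((mids[i] - m) / s) ** 2 for i in keep}
    # The exponent is capped only to avoid floating-point overflow.
    return sum(abs(exp(min(logd[i] - logd[j], 700.0)) - freqs[i] / freqs[j])
               for i in keep for j in keep)


def compass_search(keep, m, s, step=0.25, tol=1e-5):
    """Derivative-free pattern search over (m, s), with s kept above 0.05."""
    best = d_v(m, s, keep)
    for _ in range(10_000):                  # hard cap on the number of sweeps
        if step <= tol:
            break
        moved = False
        for dm, ds in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_m, cand_s = m + dm, s + ds
            if cand_s <= 0.05:
                continue
            val = d_v(cand_m, cand_s, keep)
            if val < best:
                m, s, best, moved = cand_m, cand_s, val, True
        if not moved:
            step /= 2
    return m, s, best


keep = [4, 5, 6, 7, 8]                       # truncation {y5, ..., y9}
n_t = sum(freqs[i] for i in keep)
# Starting point: grouped mean and standard deviation of the retained bins.
m0 = sum(mids[i] * freqs[i] for i in keep) / n_t
s0 = sqrt(sum(freqs[i] * (mids[i] - m0) ** 2 for i in keep) / n_t)
m_hat, s_hat, d_min = compass_search(keep, m0, s0)
print(n_t, round(m_hat, 3), round(s_hat, 3), round(d_min, 4))
```

The normalizing constant of the normal density is never needed, since it cancels in every ratio; this is the same mechanism that removes the unknown complete sample size from the criterion.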

4.3.1 Gamma probability density 

Consider a sample of size 800 drawn from a Gamma distribution $G(a,b)$ with density
given by

$$f(x|a,b) = \frac{1}{\Gamma(a)\, b^a}\, x^{a-1} \exp\left(-\frac{x}{b}\right), \quad x > 0, \tag{15}$$

and parameters $a = 7$ and $b = 3$. Consider the data falling in 16 fixed class intervals as
shown in the following table, with mid-classes $u_i$ and absolute frequencies $n_i$:
Table 5.

u_i    5.89    8.72   11.56   14.39   17.23   20.06   22.89   25.73   28.56   31.39
n_i   11      40      60     108     118     104     100      74      63      53

u_i   34.23   37.06   39.89   42.73   45.56   48.39
n_i   27      21      11       5       3       2



In the following table we show the estimates $\hat b_1$ from the minimum of the distance $d_v$
and $\hat b_2$ by the second method for the parameter $b$, with known $a = 7$, according to the
truncation $A$ considered.

Table 6. The estimates $\hat b_1$ and $\hat b_2$ by the new approach of the parameter $b$
of the Gamma probability distribution $G(a,b)$ with $b = 3$ and known $a = 7$.

No.   A                                        n_t    Q (%)     b_1      b_2
 1    {u1,...,u16}                             800     0        3.018    3.054
 2    {u2,...,u15}                             787     1.625    2.980    3.065
 3    {u1,...,u12}                             779     2.625    3.012    3.068
 4    {u1,...,u10}                             731     8.625    2.895    3.059
 5    {u2,...,u10}                             720    10        3.063    3.075
 6    {u3,...,u10}                             680    15        3.157    3.119
 7    {u1,...,u9}                              678    15.25     2.864    3.002
 8    {u2,...,u9}                              667    16.625    2.978    3.018
 9    {u3,...,u9}                              627    21.625    3.086    3.062
10    {u1,...,u8}                              615    23.125    2.859    2.960
11    {u2,...,u8}                              604    24.5      2.908    2.977
12    {u4,...,u9}                              567    29.125    3.046    3.016
13    {u2,...,u7}                              530    33.75     2.908    2.978
14    {u2,u3,u4,u5,u10,u11,u12,u13,u14}        443    44.625    3.018    3.080
15    {u1,...,u6}                              441    44.875    2.775    2.894
16    {u1,u2,u3,u4,u8,u9,u10,u11,u15}          439    45.125    2.969    3.048
17    {u1,...,u5,u11,...,u16}                  406    50.75     3.018    3.031
18    {u1,...,u5}                              337    57.875    2.788    2.931
19    {u8,...,u16}                             259    67.625    2.990    3.212
20    {u10,...,u16}                            122    84.75     2.894    2.822

The estimations from the two methods are also accurate in this case of the Gamma
distribution for the parameter b. Here also the truncation proportion does not noticeably affect the
quality of the estimations. When we consider the complete data, the classical estimation is
$\hat b = 3.04$ and the two new estimations are $\hat b_1 = 3.018$ and $\hat b_2 = 3.054$.
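The second-method estimate on a given truncation can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes that the auxiliary distribution puts mass proportional to $f(u_i \mid a, b)$ on the retained mid-classes of Table 5 and that b is obtained by matching the mean of this auxiliary distribution to the empirical mean of the retained classes (which coincides with maximum likelihood here, since a is known). The function names are hypothetical.

```python
import math

# Mid-classes and absolute frequencies copied from Table 5.
U = [5.89, 8.72, 11.56, 14.39, 17.23, 20.06, 22.89, 25.73,
     28.56, 31.39, 34.23, 37.06, 39.89, 42.73, 45.56, 48.39]
N = [11, 40, 60, 108, 118, 104, 100, 74, 63, 53, 27, 21, 11, 5, 3, 2]
A_SHAPE = 7.0  # known shape parameter a


def auxiliary_mean(b, points, a=A_SHAPE):
    """Mean of the assumed auxiliary distribution h(u) = f(u|a,b) / sum over A."""
    # Log-weights of the Gamma density; factors not depending on u cancel in h.
    logw = [(a - 1.0) * math.log(u) - u / b for u in points]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    return sum(u * wi for u, wi in zip(points, w)) / sum(w)


def estimate_b(indices, lo=0.5, hi=20.0, tol=1e-10):
    """Mean-matching estimate of b on the truncation given by 1-based indices."""
    points = [U[i - 1] for i in indices]
    counts = [N[i - 1] for i in indices]
    target = sum(u * n for u, n in zip(points, counts)) / sum(counts)
    # The auxiliary mean increases with b, so plain bisection is sufficient.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if auxiliary_mean(mid, points) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b2_full = estimate_b(range(1, 17))   # truncation of row 1: all 16 mid-classes
b2_row5 = estimate_b(range(2, 11))   # truncation of row 5: {u_2, ..., u_10}
print(round(b2_full, 3), round(b2_row5, 3))
```

Under these assumptions the two printed values should fall close to the corresponding $\hat b_2$ entries of Table 6; small differences may come from rounding of the mid-classes.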

As was noticed in the examples above, the two methods lead to approximately the
same estimation results. Nevertheless, if the two estimations are significantly different,
this seems to be related to the quality of the selected data. An important feature of this new
approach is that the quality of the estimations is not influenced by the truncation proportion.
The following section will give further insight into the new approach.

4.4 A Basic Feature of the New Approach 

The preceding results have shown that the new approach is effective and works well
in simulation experiments. Furthermore, the proposition below gives an insight into a
major feature of the new approach by considering the one-parameter exponential family.
We will prove that, for any truncation formed by two or more points from a
sample of observations, if the ratios of the relative frequencies of the $u_i$ are equal to the
theoretical ones, then we obtain the true value of the parameter. We may conjecture
that, for an arbitrary probability law depending on r parameters, a truncation
composed of r + 1 points having exact empirical ratios of the
relative frequencies yields the true values of the r parameters.

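Proposition 10 Consider a probability distribution f from the one-parameter exponential
family with density

\[ f(x, \theta) = K(x) \exp[\theta T(x) - A(\theta)], \tag{16} \]

where $\theta \in \mathbb{R}$ is the parameter, T a statistic, K(x) a function of x and A a function
of the parameter $\theta$. Assume that we wish to estimate the parameter $\theta$. If we consider
a truncation having two points x and y with empirical frequencies $f_1$ and $f_2$ satisfying
$f_1/f_2 = f(x, \theta)/f(y, \theta)$, then, using the approach considered here, we obtain the true
value of $\theta$.

Proof. 1. If we consider the minimum of the distance $d_v$ the result is immediate.
2. Consider now the second method to estimate $\theta$. Consider two values x and y from
the exponential family with density given by (16), with $\hat\theta$ being the estimation by the
new approach, and assume that their empirical frequencies $f_1$ and $f_2$ are such that

\[ \frac{f_1}{f_2} = \frac{f(x, \theta)}{f(y, \theta)}. \]

We obtain for the empirical mean of the two points

\[ \bar u = \frac{x f_1 + y f_2}{f_1 + f_2} = \frac{x K(x) \exp(\theta T(x)) + y K(y) \exp(\theta T(y))}{K(x) \exp(\theta T(x)) + K(y) \exp(\theta T(y))}. \]

Then, we solve on $\hat\theta$ the following equation:

\[ \frac{x K(x) \exp(\hat\theta T(x)) + y K(y) \exp(\hat\theta T(y))}{K(x) \exp(\hat\theta T(x)) + K(y) \exp(\hat\theta T(y))} - \bar u = 0, \]

after straightforward algebra (cross-multiplying and cancelling the common terms, for $x \neq y$) we obtain

\[ \exp[(\hat\theta - \theta)(T(x) - T(y))] = 1, \]

yielding the true value $\hat\theta = \theta$, provided $T(x) \neq T(y)$. The proof is complete. ■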






Remark 11 Note that the frequencies $f_1$ and $f_2$ need not be exact, that is, $f_1$ may be
different from $f(x, \theta)$, and likewise $f_2$ from $f(y, \theta)$; we require only that their ratio is equal to the
theoretical one $f(x, \theta)/f(y, \theta)$.



Examples 

Binomial distribution. Consider again the binomial distribution B(n, p) with n = 10
and p = 0.3, and assume n is known and we wish to estimate p. Assume we have the
following truncation with only two points, A = {0, 1}. The exact ratio of their probabilities
is given by f(0, p)/f(1, p) = 7/30, which is a rational value that will simplify
the example. Choose the absolute frequencies of the two values considered as being
$n_1 = 7$ and $n_2 = 30$ for the values $u_1 = 0$ and $u_2 = 1$ respectively, so that
$f_1/f_2 = f(0, p)/f(1, p) = 7/30$. Using the first approach, that of the minimum of the distance
$d_v$, we have to solve

\[ \min_p d_v(\hat f, f) = \min_p \left[ \left| \frac{7}{30} - \frac{C_{10}^{0} (1-p)^{10}}{C_{10}^{1}\, p (1-p)^{9}} \right| + \left| \frac{30}{7} - \frac{C_{10}^{1}\, p (1-p)^{9}}{C_{10}^{0} (1-p)^{10}} \right| \right], \]

and we get the true value $\hat p_1 = 0.3$.

Using the second method we have to solve the following equation on p

\[ \frac{0 \times C_{10}^{0} (1-p)^{10} + 1 \times C_{10}^{1}\, p (1-p)^{9}}{C_{10}^{0} (1-p)^{10} + C_{10}^{1}\, p (1-p)^{9}} = \frac{30}{37}, \]

and we obtain also the exact result $\hat p_2 = 0.3$.
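Both computations of this binomial example can be checked numerically. The sketch below is only an illustration under stated assumptions: the two-point distance is written as the sum of the absolute differences of the two reciprocal ratios (the form displayed in the Gamma example of this section), the frequencies are taken as 7 and 30, and `golden_min` and `bisect_root` are hypothetical helper names.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def d_v_two_points(p, n1=7, n2=30, n=10):
    """Assumed two-point ratio distance on A = {0, 1} for B(n, p)."""
    r_emp = n1 / n2
    r_th = binom_pmf(0, n, p) / binom_pmf(1, n, p)
    return abs(r_emp - r_th) + abs(1.0 / r_emp - 1.0 / r_th)


def golden_min(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimiser of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def auxiliary_mean(p, n=10):
    """Mean of the auxiliary distribution of B(n, p) restricted to {0, 1}."""
    w0, w1 = binom_pmf(0, n, p), binom_pmf(1, n, p)
    return (0 * w0 + 1 * w1) / (w0 + w1)


def bisect_root(func, lo, hi, tol=1e-12):
    """Bisection for an increasing function that changes sign on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p1 = golden_min(d_v_two_points, 0.01, 0.99)                         # first method
p2 = bisect_root(lambda p: auxiliary_mean(p) - 30 / 37, 0.01, 0.99)  # second method
print(round(p1, 6), round(p2, 6))
```

Both searches recover p = 0.3 up to numerical tolerance, in line with Proposition 10.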

Gamma distribution. Consider the Gamma probability distribution G(a, b) with
a = 10 and b = 5. Assume that a is known and we wish to estimate b. Consider the
truncation $A = \{u_3, u_8\}$ with $u_3 = 30.13$ and $u_8 = 60.02$. We have the following
value of the ratio $f(u_3, b)/f(u_8, b) \approx 0.799$ (the result is approximate since for
probability density functions it is difficult to get an exact rational value, but we will show
that the estimations are very close to the true value). Consider the absolute frequencies
$n_3 = 79.93$ (or 80) and $n_8 = 100$ for the values $u_3 = 30.13$ and $u_8 = 60.02$ respectively.
We have then $n_3/n_8 \approx f(u_3, b)/f(u_8, b)$. Using the minimum of the distance $d_v$, we have to
solve

\[ \min_b d_v(\hat f, f) = \min_b \Big[ \big| (79.93/100) - (30.13/60.02)^{9} \exp(-(1/b)(30.13 - 60.02)) \big| + \big| (100/79.93) - (60.02/30.13)^{9} \exp(-(1/b)(60.02 - 30.13)) \big| \Big], \]

and we get the result $\hat b_1 \approx 5$.

From the second method, we compute $\bar u = 46.7438$ and solve on b the following
equation

\[ (30.13 - 46.7438) \times 30.13^{9} \exp(-30.13/b) + (60.02 - 46.7438) \times 60.02^{9} \exp(-60.02/b) = 0. \]

The result is $\hat b_2 \approx 5$.
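The two Gamma computations above can be reproduced with a few lines of code. This is a hedged sketch: it minimises the displayed two-point distance with a golden-section search and solves the displayed mean equation by bisection; the helper names and the search interval [1, 20] are assumptions made for the illustration.

```python
import math

U3, U8 = 30.13, 60.02      # the two retained points
N3, N8 = 79.93, 100.0      # their absolute frequencies
A_SHAPE = 10.0             # known shape parameter a
U_BAR = 46.7438            # empirical mean quoted in the text


def theoretical_ratio(b):
    """f(u3, b) / f(u8, b) for the Gamma density with known shape a."""
    return (U3 / U8) ** (A_SHAPE - 1) * math.exp(-(U3 - U8) / b)


def d_v(b):
    """Two-point distance as displayed in the example."""
    r_emp, r_th = N3 / N8, theoretical_ratio(b)
    return abs(r_emp - r_th) + abs(1.0 / r_emp - 1.0 / r_th)


def mean_equation(b):
    """Left-hand side of the second-method equation."""
    return ((U3 - U_BAR) * U3 ** (A_SHAPE - 1) * math.exp(-U3 / b)
            + (U8 - U_BAR) * U8 ** (A_SHAPE - 1) * math.exp(-U8 / b))


def golden_min(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimiser of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return 0.5 * (lo + hi)


def bisect_root(func, lo, hi, tol=1e-12):
    """Bisection for a function that is negative at lo and positive at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b1 = golden_min(d_v, 1.0, 20.0)            # minimum-distance estimate
b2 = bisect_root(mean_equation, 1.0, 20.0)  # second-method estimate
print(round(b1, 3), round(b2, 3))
```

Both values come out within a small fraction of a percent of the true b = 5; they are not exactly 5 because the frequency ratio 79.93/100 is itself only an approximation of the theoretical ratio.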
Now assume that the parameters a and b are both unknown, and let us show how to estimate
them jointly using the new approach. Since there are now two unknown parameters,
we need three points from the support, so consider $u_1 = 34.7702$, $u_2 = 57.5008$
and $u_3 = 74.5487$ with their corresponding absolute frequencies $n_1 = 102$, $n_2 = 100$ and
$n_3 = 34$. We have to find a and b which minimize the distance $d_v$, that is, $\min_{a,b} d_v(\hat f, f)$.
The result is $\hat a \approx 10.0454$ and $\hat b \approx 4.9739$.
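With three points and two parameters, the frequency ratios can in general be matched exactly, so the minimum can be located without a numerical optimiser. The sketch below rests on an assumption: that $d_v$ vanishes exactly when every empirical frequency ratio equals the theoretical one. Since $\log f(u) = (a-1)\log u - u/b + \text{const}$, the two ratio equations are linear in $(a - 1, 1/b)$ and are solved here by Cramer's rule.

```python
import math

u = [34.7702, 57.5008, 74.5487]   # retained points
n = [102.0, 100.0, 34.0]          # absolute frequencies


def ratio_equation(i, j):
    """Coefficients of log(n_i/n_j) = (a-1) log(u_i/u_j) - (u_i - u_j) (1/b)."""
    return (math.log(u[i] / u[j]), -(u[i] - u[j]), math.log(n[i] / n[j]))


a11, a12, r1 = ratio_equation(0, 1)
a21, a22, r2 = ratio_equation(1, 2)

# Solve the 2x2 linear system in the unknowns (a - 1) and 1/b.
det = a11 * a22 - a12 * a21
shape_minus_one = (r1 * a22 - a12 * r2) / det
inverse_scale = (a11 * r2 - a21 * r1) / det

a_hat = shape_minus_one + 1.0
b_hat = 1.0 / inverse_scale
print(round(a_hat, 4), round(b_hat, 4))
```

The solution agrees with the minimiser reported in the text to about three decimal places.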

5 Elements of Comparison with the Classical Approach

Our aim here is not to give a detailed comparison study, which needs to be investigated
thoroughly, but only some elements of appreciation. A major feature which distinguishes
this new approach from the others is that when we have exact ratios of frequencies we
obtain the true parameter, and when their difference from the theoretical ratios decreases
the quality of the estimation increases, even if we are using only a part of the sample
of observations. This is not the case for classical approaches, in which
quality considerations are only viewed through mean properties of estimators or their
asymptotic behaviour. By combining the two proposed methods we have in fact a point
criterion. Another characteristic is that the proportion of truncation has no apparent effect on
the quality of the estimations. The first method uses the well-known method of minimum
distance, but with a new distance which has the important advantage of being symmetric, a
property that many traditional distances do not have. However, the estimations are
obtained implicitly in this case, so it is difficult to find explicit expressions and to study
their properties in order to compare them with classical ones. Using the new distance we hope
to obtain fast convergent estimators, since we expect that the influence of the errors in
the frequencies will be slight in the new approach, as we are using ratios of frequencies.
Consider now the second method of the new approach. We use classical procedures of
estimation, such as the maximum likelihood principle, applied to the auxiliary distribution.
We may obtain the estimators and study their properties in the usual way, which
preserves the advantages of classical methods. In the classical approach, given a sample,
the estimation of certain parameters such as the mean and variance does not change
according to the family of parent distributions. The latter information is not used, and
this is a disadvantage of the approach. In the new approach, however, the estimations of the
mean and variance change according to the distribution from which the data emanated.
The following two examples show the effectiveness of using the auxiliary distribution.

Example. Consider the following frequency table:

Table 7.

  x_j                  2              3              Total
  n_j                  n_1            n_2            n
  \hat f(x_j) = f_j    f_1 = n_1/n    f_2 = n_2/n    1
Any sample of observations that satisfies the preceding frequency table may come
from one of the following distributions:

\[ g_1(x) = \begin{cases} x/6 & \text{if } x \in \{1, 2, 3\}, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{or} \qquad g_2(x) = \begin{cases} (x-1)/6 & \text{if } x \in \{2, 3, 4\}, \\ 0 & \text{otherwise.} \end{cases} \]

The decision as to which of the two distributions is more appropriate for Table 7
depends intuitively on the values $n_1$ and $n_2$ (or $f_1$ and $f_2$). However, if we use the
classical maximum likelihood, we obtain that the samples of observations were generated
from the distribution $g_1$ whatever the values of $n_1$ and $n_2$, that is:

\[ \left(\frac{1}{6}\right)^{n_1} \times \left(\frac{2}{6}\right)^{n_2} < \left(\frac{2}{6}\right)^{n_1} \times \left(\frac{3}{6}\right)^{n_2}. \]

We will show that the decision obtained with the new approach is more relevant. Determine
first the auxiliary distributions $h_1$ and $h_2$, based on the truncation A = {2, 3}, for $g_1$
and $g_2$ respectively. We obtain

\[ h_1(x) = \begin{cases} 2/5 & \text{if } x = 2, \\ 3/5 & \text{if } x = 3, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{and} \qquad h_2(x) = \begin{cases} 1/3 & \text{if } x = 2, \\ 2/3 & \text{if } x = 3, \\ 0 & \text{otherwise.} \end{cases} \]

By using the maximum likelihood for $h_1$ and $h_2$, we have to decide according to the
quantities $(2/5)^{n_1} \times (3/5)^{n_2}$ and $(1/3)^{n_1} \times (2/3)^{n_2}$. Solving the following inequality

\[ \left(\frac{2}{5}\right)^{n_1} \times \left(\frac{3}{5}\right)^{n_2} < \left(\frac{1}{3}\right)^{n_1} \times \left(\frac{2}{3}\right)^{n_2}, \]

which is equivalent to $(6/5)^{a} (9/10)^{1-a} < 1$, where $a = n_1/n$, we obtain $0 < a <
-\log(9/10)/\log(4/3) = x_0 \approx 0.36624$. If $0 < a < x_0$, the data were generated from $g_2$,
and if $x_0 < a < 1$, the data were generated from $g_1$. We cannot make any decision in
the case $a = x_0$.
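The decision rule of this example is easy to check numerically. The short sketch below compares the two auxiliary likelihoods on A = {2, 3} and evaluates the threshold; the function name `choose_family` is hypothetical, and a is taken to be the relative frequency $n_1/(n_1 + n_2)$ of the value 2, which is the reading under which the threshold $-\log(9/10)/\log(4/3)$ is obtained.

```python
import math


def choose_family(n1, n2):
    """Pick g1 or g2 by comparing the auxiliary likelihoods of h1 and h2."""
    log_l1 = n1 * math.log(2 / 5) + n2 * math.log(3 / 5)   # h1 on {2, 3}
    log_l2 = n1 * math.log(1 / 3) + n2 * math.log(2 / 3)   # h2 on {2, 3}
    if log_l1 > log_l2:
        return "g1"
    if log_l2 > log_l1:
        return "g2"
    return "undecided"


# Threshold on a = n1 / (n1 + n2) separating the two decisions.
x0 = -math.log(9 / 10) / math.log(4 / 3)

print(round(x0, 5))
print(choose_family(3, 7))   # a = 0.3, below the threshold
print(choose_family(5, 5))   # a = 0.5, above the threshold
```

By contrast, the classical likelihoods computed from $g_1$ and $g_2$ themselves favour $g_1$ for every pair of counts, as stated in the text.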

Example. Consider a binomial distribution with parameter n = 4 and unknown p,
from which we consider some samples of observations of size 15, given in Table 8 by their
absolute frequencies and chosen so as to have $\bar x = 8/15$.

Table 8.

            Values
  Sample    0     1     2     3     4     \hat p (classical)    \hat p (new)
  1         7     8     0     0     0     0.133                 0.222
  2         9     5     0     1     0     0.133                 0.184
  3         9     4     2     0     0     0.133                 0.139
  4        10     3     1     1     0     0.133                 0.134
  5        10     4     0     0     1     0.133                 0.216
  6        12     0     2     0     1     0.133                 0.196
  7        13     0     0     0     2     0.133                 0.385
It is clear that the information given by the samples is not the same; nevertheless
the classical estimation method gives the same estimation $\hat p = 8/(15 \times 4) \approx 0.133$ for all of them. If
we use the second method of the new approach, we have to solve the following equation
for each sample:

\[ 0 \times h(0, p) + 1 \times h(1, p) + 2 \times h(2, p) + 3 \times h(3, p) + 4 \times h(4, p) = \bar x, \]

where h(x, p) is the corresponding auxiliary distribution. The estimations given by the
new method differ from one sample to another, as shown in the last column of Table 8,
which is natural since each sample provides different information about the parent
distribution. We can also use the minimum of the distance $d_v$, and we reach the same
conclusion.
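The last column of Table 8 can be recomputed with the sketch below. It assumes that the truncation A of each sample is the set of values actually observed (non-zero frequency), that h(x, p) is the B(4, p) probability renormalised over A, and that cells left blank in the table stand for zero counts; the function names are hypothetical.

```python
import math

N_TRIALS = 4
# Absolute frequencies of the values 0..4 for the seven samples of Table 8
# (blank cells are read as zeros, which is an assumption).
SAMPLES = {
    1: [7, 8, 0, 0, 0],
    2: [9, 5, 0, 1, 0],
    3: [9, 4, 2, 0, 0],
    4: [10, 3, 1, 1, 0],
    5: [10, 4, 0, 0, 1],
    6: [12, 0, 2, 0, 1],
    7: [13, 0, 0, 0, 2],
}


def auxiliary_mean(p, support):
    """Mean of B(4, p) renormalised over the observed support."""
    w = [math.comb(N_TRIALS, k) * p**k * (1.0 - p) ** (N_TRIALS - k)
         for k in support]
    return sum(k * wk for k, wk in zip(support, w)) / sum(w)


def new_estimate(counts, tol=1e-12):
    """Solve sum_x x h(x, p) = x_bar for p by bisection."""
    support = [k for k, c in enumerate(counts) if c > 0]
    x_bar = sum(k * c for k, c in enumerate(counts)) / sum(counts)
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if auxiliary_mean(mid, support) < x_bar:   # the mean increases with p
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


classical = {s: sum(k * c for k, c in enumerate(cnt)) / (sum(cnt) * N_TRIALS)
             for s, cnt in SAMPLES.items()}
estimates = {s: new_estimate(cnt) for s, cnt in SAMPLES.items()}
for s in SAMPLES:
    print(s, round(classical[s], 3), round(estimates[s], 3))
```

The classical column is constant at 8/60 by construction, while the new estimate varies with the set of observed values.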

6 Perspectives for the New Approach 
6.1 Model Selection From Truncated Data 

The fact that the distance $d_v$ is a metric allows us to propose various applications of this new
measure. We can use it for model selection amongst different probability families. We
choose two or more candidate parametric families of distributions and, for each
alternative family, estimate the parameters to select a specific candidate. We then determine
the distance between the specific candidate and the empirical distribution using the new
metric $d_v$. Finally, we select the family which yields the minimum distance. In view of the
new approach this can also be done in the case of truncated data, as opposed to classical
approaches to model selection (see for example Cox [3], [1], Taylor and Jakeman [16]),
which can be used, to the best of our knowledge, only for complete data.

To investigate this perspective thoroughly, samples of various sizes from known
distributions should be simulated and the method for model selection applied. We can score
each selection as correct or not and, after repeating the process a large number of times, the
probability of correct selection can be estimated for a given sample size.

We can also use the new distance in cases where classical goodness of fit tests cannot
reject two candidate families. We can choose the one which yields the minimum of the
distance $d_v$.

In the following examples we select, in the first, between binomial distributions
from truncated data. In the second example, we select between a Weibull and a Gamma
distribution from right truncated data.

Selection from Binomial distributions. We simulated 10000 samples of size 100
from a Binomial distribution B(8, 0.1) and each time we retained only the observations
belonging to {0, 1, 2, 3} with their frequencies. Then we tried to identify the simulated
law starting from the corresponding table of frequencies. We used the distance $d_v$ to
select between the original distribution of each simulated sample and the distribution
B(10, 0.15), and we scored the selection as correct if the distance between the empirical
distribution and the original one is less than the distance to the alternative B(10, 0.15). The
correct distribution was selected in 98.8% of the cases. Conversely, we simulated 10000 samples of size
100 from a Binomial distribution B(10, 0.15) and selected against B(8, 0.1); the correct
distribution was selected in 99.43% of the cases.

Selection between Weibull and Gamma distributions. We simulated 10000 samples
of size 1000 from the Weibull distribution W(1.2, 1.5) and we truncated them on the
right by considering only observations above the cut-off 1.25. Each truncated sample was
summarized into 11 classes. We selected between W(1.2, 1.5) and the Gamma distribution
G(2, 0.5). The distance $d_v$ selected the correct distribution, that is W(1.2, 1.5), in
98.16% of the cases.

Before selecting between distributions, we can also find the best fit, within the family
of Gamma distributions G(a, b), to the truncated data from a given probability density,
say W(1.2, 1.5). We then have to solve an optimization problem of finding the minimum
of a function of two variables, $\min_{a,b} d_v(\hat f, f)$, where $\hat f$ is the empirical distribution and
f = G(a, b), using well-known methods such as Levenberg-Marquardt in a computer
algebra package. It should also be better to choose the number of bins for each truncated
sample by an optimal procedure, for example that of Birgé and Rozenholc [2].

6.2 Estimation of the Initial Trial Value in the EM Algorithm

The initial starting value is of great importance for the convergence behaviour of algorithms
such as the EM algorithm. Usually, as for the latter, the initial trial value is guessed.
We will show that our procedure gives an estimation of the starting value
instead of having to guess it. The approach will be illustrated by the following classical
example, which was the basis of the EM algorithm.

Example of Hartley (1958) revisited. Hartley [7] used an algorithmic procedure to
estimate the parameter of a Poisson distribution from data on the pollution of a sort of
seeds by the presence of noxious weed seeds, quoted from Snedecor [15], and truncated
them by omitting the frequencies of the values 0 and 1, as shown in the following Table 9
(Table 1 in Hartley [7]).



Table 9.

                missing       observed
  Values        0     1       2     3     4     5     6     7     9
  Frequencies   -     -       26    16    18    9     3     5     1



Hartley [7] guessed the frequencies of the missing values 0 and 1 by taking $n_0 = 4$
and $n_1 = 14$, and after 4 steps of his algorithmic procedure, which has been the basis
of the well-known EM algorithm for incomplete data (Dempster, Laird and Rubin [5]),
reached the estimation $\hat\lambda = 3.026$ (see Table 1, p. 177, in Hartley [7]). Using the second
method, we get the estimation $\hat\lambda_2 = 3.1149$, and by a proportional allocation procedure
the frequencies we get are $n_0 = 4.29$ and $n_1 = 13.38$, which are close to
the guessed values. Using the distance $d_v$ we obtain the estimation $\hat\lambda_1 = 3.8447$, and by
removing the last value, which has a small frequency $n_7 = 1$, we obtain a better result,
$\hat\lambda_1 = 3.4441$. These values are also acceptable as starting values, since in practice the true
parameter is unknown.
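The second-method estimate and the proportional allocation for Hartley's data can be sketched as follows. The code assumes that the auxiliary distribution is the Poisson law renormalised over the observed values of Table 9 and that the parameter is found by matching its mean to the empirical mean of the observed data; the names are hypothetical, and the missing frequencies are allocated in proportion to the fitted Poisson weights.

```python
import math

VALUES = [2, 3, 4, 5, 6, 7, 9]       # observed values from Table 9
FREQS = [26, 16, 18, 9, 3, 5, 1]     # their absolute frequencies


def poisson_weight(x, lam):
    """Poisson probability up to the factor exp(-lam), which cancels in ratios."""
    return lam**x / math.factorial(x)


def auxiliary_mean(lam):
    w = [poisson_weight(x, lam) for x in VALUES]
    return sum(x * wx for x, wx in zip(VALUES, w)) / sum(w)


def estimate_lambda(lo=0.1, hi=20.0, tol=1e-12):
    """Bisection on the mean equation; the auxiliary mean increases with lam."""
    target = sum(x * n for x, n in zip(VALUES, FREQS)) / sum(FREQS)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if auxiliary_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam2 = estimate_lambda()

# Proportional allocation of the missing frequencies of the values 0 and 1.
scale = sum(FREQS) / sum(poisson_weight(x, lam2) for x in VALUES)
n0 = scale * poisson_weight(0, lam2)
n1 = scale * poisson_weight(1, lam2)
print(round(lam2, 4), round(n0, 2), round(n1, 2))
```

The resulting value of the parameter and the two allocated frequencies can then serve as the starting point of the EM iterations in place of a guess.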

Initial trial value for mixtures of normal populations. We shall present an application
of the previous method, used for truncated data, to the situation where we have a
mixture population of two normal distributions. In classical methods, we use the merged
distribution $f = \alpha f_1 + (1 - \alpha) f_2$ and we estimate the parameters $\alpha$, $m_1$ and $m_2$ using,
for example, the EM algorithm, which is based on maximizing the complete likelihood of
the merged distribution by an algorithmic procedure starting from a guessed initial trial value.
However, the occurrence of several local maxima is a well-known problem in the setting
of the EM algorithm. Also, Seidel, Mosler and Alker [14] pointed out that the likelihood-ratio
test in mixture models depends on the choice of the initial trial value for the EM
algorithm. If the initial trial value is close to the true value, the algorithm can be expected
to converge in a few steps to the relevant local maximum. We will show that the new
approach yields an accurate estimated initial trial value.

Assume we have a merged sample from two samples of observations of sizes $n_1$ and
$n_2$ from two normal distributions $f_1 = N(m_1, \sigma_1)$ and $f_2 = N(m_2, \sigma_2)$, with $m_1 \neq m_2$.
Assuming that $\sigma_1$ and $\sigma_2$ are known, our aim is to estimate the means $m_1$ and $m_2$,
and also the merging proportion $\alpha$ of each population.

We will use a method based on truncations. The main idea is to split the range
of the merged sample into three suitably chosen parts: a central part where the
observations are highly merged, and left and right truncated parts where the observations
come mainly from one of the distributions considered. If for example $m_1 < m_2$, then
to estimate $m_1$ we have to use the chosen right truncated part (left truncation A).

The procedure is summarized as follows:

1. We compute the sample mean $m_g$ of the merged observations.

2. To determine the location of the two means $m_1$ and $m_2$, we compute the
empirical standard deviation $S_l$ of the observations less than $m_g$, and $S_r$ for those that
are greater. Assume that $S_l < S_r$; in this case, if $\sigma_1 < \sigma_2$ then we deduce that $m_1$ is
situated on the left of $m_g$. Otherwise, it will be assumed to be on its right. We follow
the same idea for the case $S_l > S_r$. If $\sigma_1 = \sigma_2$ we pass directly to the third step.

3. Assume that $m_1$ is on the left. It is well known that for a normal distribution
$N(m, \sigma)$ we have $P(]m - \sigma, m + \sigma[) \approx 0.68$. We hope that on the left of $\sup_l = m_g - \sigma_2$
the number of observations generated from $N(m_2, \sigma_2)$ is negligible, and that on the right of
$\min_r = m_g + \sigma_1$ the number of observations generated from $N(m_1, \sigma_1)$ is also negligible.
Hence, to estimate $m_1$, we consider only the part of the observations situated on the left of
$m_g - \sigma_2$, and to estimate $m_2$ we consider the part situated on the right of $m_g + \sigma_1$.
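The three steps above can be sketched in code as follows. This is only an illustration: the function name and the returned dictionary are hypothetical, the symmetric treatment of the case where $m_1$ lies on the right is an assumption, and the data are simulated with an arbitrary seed using the sizes and parameters of the example discussed in this section.

```python
import random
import statistics


def split_merged_sample(xs, sigma1, sigma2):
    """Split a merged two-component sample into left and right truncated parts."""
    # Step 1: global mean of the merged observations.
    m_g = statistics.fmean(xs)

    # Step 2: locate m1 relative to m_g from the one-sided standard deviations.
    s_l = statistics.stdev([x for x in xs if x < m_g])
    s_r = statistics.stdev([x for x in xs if x > m_g])
    if sigma1 == sigma2:
        m1_on_left = True            # labels are interchangeable; take m1 < m2
    else:
        m1_on_left = (s_l < s_r) == (sigma1 < sigma2)

    # Step 3: cut-offs one standard deviation away from m_g on each side.
    sigma_left, sigma_right = (sigma1, sigma2) if m1_on_left else (sigma2, sigma1)
    sup_l = m_g - sigma_right        # equals m_g - sigma2 when m1 is on the left
    min_r = m_g + sigma_left         # equals m_g + sigma1 when m1 is on the left
    return {
        "m_g": m_g,
        "m1_on_left": m1_on_left,
        "sup_l": sup_l,
        "min_r": min_r,
        "left_part": [x for x in xs if x < sup_l],
        "right_part": [x for x in xs if x > min_r],
    }


rng = random.Random(0)
merged = ([rng.gauss(1.3, 1.0) for _ in range(300)]
          + [rng.gauss(2.4, 1.0) for _ in range(200)])
parts = split_merged_sample(merged, 1.0, 1.0)
print(round(parts["m_g"], 4), len(parts["left_part"]), len(parts["right_part"]))
```

Each of the two returned parts would then be grouped into classes and passed to the estimation methods for truncated data.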

The following example will provide some feel for the accuracy of the procedure.
Example. We consider the case where $\sigma_1 = \sigma_2$. Consider two samples of observations
generated from $N(m_1, \sigma_1)$ and $N(m_2, \sigma_2)$, where $m_1 = 1.3$ and $m_2 = 2.4$, with known
$\sigma_1 = \sigma_2 = 1$ and sizes $n_1 = 300$ and $n_2 = 200$. We combine them to obtain a merged
sample of size n = 500. We have chosen the distributions in such a way that the histogram
(Fig. 1) of the merged sample does not directly show the existence of a mixture of two
distributions. When the histogram of the merged population is bimodal the situation
is easier, since a suitably chosen left (or right) part then gives a more accurate
estimation, because this part will contain a negligible number of observations
from the second distribution.
[Figure: histogram of the merged sample; horizontal axis from -2 to 5, vertical axis (frequencies) from 0 to 80.]

Fig 1. Merged histogram of two normal distributions N(1.3, 1) and N(2.4, 1).

It should be stressed that the histogram is unimodal and does not show at first glance
any mixture situation. Following the steps of the procedure, we begin by calculating the
mean of the resulting merged sample and we obtain $m_g = 1.8046$. Since the standard
deviations are assumed to be equal, we compute directly $\sup_l = m_g - \sigma_2 = 0.8046$. By
grouping the observations on the left of $\sup_l$ (which constitute the chosen right truncated
part) in 7 classes we obtain the following table:



Table 10.

  u_i   -1.5589   -1.1294   -0.6998   -0.2703    0.1593    0.5888
  n_i         1         3         6        17        24        41



Using the distance $d_v$ we obtain, for the whole truncation, $\hat m_1^{(v)} = 1.244$, and by deleting
$u_1$ we get the value $\hat m_1^{(v)} = 1.2516$.

The sample mean of the observations on the left of $\sup_l$ is given by $\bar u_l = 0.1483$.
Using the second method we have to solve on m the following equation

\[ \frac{u_1 \exp\left[-\frac{(u_1 - m)^2}{2\sigma^2}\right] + u_2 \exp\left[-\frac{(u_2 - m)^2}{2\sigma^2}\right] + \dots + u_k \exp\left[-\frac{(u_k - m)^2}{2\sigma^2}\right]}{\exp\left[-\frac{(u_1 - m)^2}{2\sigma^2}\right] + \exp\left[-\frac{(u_2 - m)^2}{2\sigma^2}\right] + \dots + \exp\left[-\frac{(u_k - m)^2}{2\sigma^2}\right]} = \bar u_l. \tag{17} \]

We obtain the estimation $\hat m_1 = 1.2646$. By deleting the first value $u_1$, which has a weak
frequency $n_1 = 1$, that is, using the truncation $A = \{u_2, u_3, u_4, u_5, u_6\}$ (for which we compute
again $\bar u_l = 0.1734$), we obtain a better estimation $\hat m_1 = 1.3011$, which is very close to the
true value $m_1 = 1.3$.
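Formula (17) can be solved by bisection, since its left-hand side increases with m. The sketch below uses the mid-classes of Table 10 with $\sigma = 1$. Two points are assumptions of this illustration: the fifth mid-class is taken as 0.1593 (positive), the value implied by the equal class spacing, and the target mean for the full truncation is the value 0.1483 quoted in the text rather than a recomputed one. Function names are hypothetical.

```python
import math

# Mid-classes and frequencies of Table 10; u_5 is taken as +0.1593 (assumption,
# implied by the constant spacing of the classes).
U = [-1.5589, -1.1294, -0.6998, -0.2703, 0.1593, 0.5888]
N = [1, 3, 6, 17, 24, 41]
SIGMA = 1.0


def auxiliary_mean(m, points, sigma=SIGMA):
    """Left-hand side of formula (17)."""
    logw = [-((u - m) ** 2) / (2.0 * sigma**2) for u in points]
    top = max(logw)                      # rescale to avoid underflow
    w = [math.exp(lw - top) for lw in logw]
    return sum(u * wi for u, wi in zip(points, w)) / sum(w)


def solve_formula_17(points, target, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection on m; the auxiliary mean is increasing in m."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if auxiliary_mean(mid, points) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Full truncation, with the sample mean quoted in the text.
m1_full = solve_formula_17(U, 0.1483)

# Truncation without u_1, with the grouped mean recomputed from the table.
points, counts = U[1:], N[1:]
u_bar = sum(u * n for u, n in zip(points, counts)) / sum(counts)
m1_trimmed = solve_formula_17(points, u_bar)

print(round(u_bar, 4), round(m1_full, 4), round(m1_trimmed, 4))
```

Under these assumptions the two roots agree closely with the estimations reported in the text for the full and the reduced truncation.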
To estimate $m_2$, we consider the part situated on the right of $\min_r = m_g + \sigma_1 = 2.8046$.
Grouping the observations on the right of $\min_r$ (which constitute the chosen right part)
in 7 classes we obtain the following table:



Table 11.

  u_i   2.979   3.316   3.653   3.990   4.326   4.663
  n_i      38      25      15       9       7       3



Using the distance $d_v$ for the whole truncation we get $\hat m_2^{(v)} = 2.397$. The sample mean
of the observations on the right part is given by $\bar u_d = 3.523$. Using formula (17) with
$\bar u_d$, we obtain the result $\hat m_2 = 2.245$. Deleting the extreme values $u_1$ and $u_6$ we obtain
$\hat m_2 = 2.412$.

The mixture proportion $\alpha$ can easily be estimated using the formula $\alpha \times m_1 + (1 - \alpha) \times
m_2 = m_g$.

Considering the estimations obtained, which are close to the true values of $m_1$ and
$m_2$, we expect that the EM algorithm will converge quickly to the solution.



6.3 Test of Goodness of Fit Based on the New Distance 

We can obtain empirical quantile estimations of $d_v$ using Monte Carlo or bootstrap
techniques, and use them in a test of goodness of fit for a specified probability distribution.
We simulate N samples of the same size from the specified probability distribution and
calculate the distances $d_v^{(1)}, \dots, d_v^{(N)}$. We can then estimate the distribution
of $d_v$ by

\[ \hat F_{d_v}(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{d_v^{(i)} \le x\}. \tag{18} \]

Consequently, for a sample of the same size we compute $d_v^{(obs)}$ and we reject the hypothesis
that it comes from the specified distribution if $\hat F_{d_v}(d_v^{(obs)}) > 1 - \alpha$ for a given level of
significance $\alpha$.

The values $d_v^{(1)}, \dots, d_v^{(N)}$ may also be obtained from the empirical distribution function $F_n$
of the sample.
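The Monte Carlo version of this test can be sketched as follows. The example is deliberately small and rests on assumptions: it uses only two support points and the two-point ratio form of the distance displayed in the Gamma example, a B(10, 0.3) null model chosen for illustration, and hypothetical function names. A distance over more points would be plugged in at the same place.

```python
import math
import random


def ratio_distance(counts, probs):
    """Assumed two-point ratio distance between frequencies and model."""
    (n1, n2), (p1, p2) = counts, probs
    if n1 == 0 or n2 == 0:
        return math.inf              # ratios undefined when a point is unobserved
    return abs(n1 / n2 - p1 / p2) + abs(n2 / n1 - p2 / p1)


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def draw_counts(rng, n_obs, n, p, points):
    """Simulate n_obs binomial observations and count the retained points."""
    sample = [sum(rng.random() < p for _ in range(n)) for _ in range(n_obs)]
    return [sample.count(k) for k in points]


def monte_carlo_test(obs_counts, n_obs, n, p, points,
                     n_sim=1000, alpha=0.05, seed=0):
    """Return the empirical cdf (18) at the observed distance and the decision."""
    rng = random.Random(seed)
    probs = [binom_pmf(k, n, p) for k in points]
    d_obs = ratio_distance(obs_counts, probs)
    sims = [ratio_distance(draw_counts(rng, n_obs, n, p, points), probs)
            for _ in range(n_sim)]
    f_hat = sum(d <= d_obs for d in sims) / n_sim
    return f_hat, f_hat > 1.0 - alpha


# Frequencies of the values 2 and 3 in two hypothetical samples of size 100.
f_close, reject_close = monte_carlo_test([23, 27], 100, 10, 0.3, (2, 3))
f_far, reject_far = monte_carlo_test([40, 10], 100, 10, 0.3, (2, 3))
print(f_close, reject_close, f_far, reject_far)
```

The first set of frequencies is close to what the null model predicts and is not rejected, whereas the second has a ratio far from the theoretical one and is rejected at the 5% level.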



6.4 Quality of Data 

The fact that the new measure $d_v$ is not equivalent to classical ones means that it treats
other aspects not investigated by the latter. This may open new perspectives, such as
making decisions about the accuracy of an estimation in cases where the classical and
new estimations are close to each other. In cases where the classical estimation and
the new one using $d_v$ are significantly different, we can say that the sample of
observations considered does not coherently restore all the necessary information about the
parent distribution from which it emanated.
7 Concluding Remarks

In the foregoing study, we have presented a new statistical point estimation method
which was found to be useful in truncated, grouped and censored data situations. A new
distance between probability distributions was introduced. It measures the difference
between the variations of two given probability distributions. We introduced an auxiliary
distribution based on a truncation, from a chosen family of probability distributions.
This new distribution has the same parameters to estimate as the parent one. We
then use statistical methods to estimate the parameters of the random variable under
study using the empirical and the new auxiliary distribution in the region that captures the
data, from which we determine the corresponding parent distribution. The latter is the
estimation by the new method. Using the new distance introduced, we also estimate by
the minimum distance approach and use the resulting estimation as a control on the
accuracy of the estimation obtained by the former method. We have obtained a result which
states that if we have to estimate the parameter of a probability distribution from the
one-parameter exponential family, then it suffices to have two points with an exact ratio of
frequencies, that is, equal to the theoretical one expressed by the ratio of the values of the
probability distribution at these two points, to obtain the true value of the parameter.
We have conjectured that if we have in general r parameters, then it suffices to have r + 1
points with exact ratios of their frequencies to obtain the r true parameters exactly. The
latter result needs to be proved rigorously in a general setting for distributions other than
the class considered. A large comparative study between the classical and new methods
should also be carried out. We presented some perspectives of the new approach, such
as model selection from truncated data using the new distance, estimation of the first
trial value in the celebrated EM algorithm in the case of truncation and for a mixture of
two normal populations, a test of goodness of fit based on the new distance, and decision
making about the quality of estimations and data.

References 

[1] Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University 
Press. 

[2] Birgé, L. and Rozenholc, Y. (2006). How many bins should be put in a regular
histogram. ESAIM: Probability and Statistics, Vol. 10, pp. 24-45.

[3] Cox, D. R. (1961). Tests of separate families of hypotheses. In Proceedings of the 
Fourth Berkeley Symposium, Vol. 1, 105-123. Berkeley: University of California 
Press. 

[4] Cox, D. R. (1962). Further results on tests of separate families of hypotheses. J. R. 
Statist. Soc. B n° 24, pp 406-424. 

[5] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from 
incomplete data via the EM algorithm, J. R. Statist. Soc. B n° 1, pp 1-38. 






[6] Efron, B. and Petrosian, V. (1999). Nonparametric methods for doubly truncated 
data, J. Am. Stat. Assoc. Vol. 94. No. 447. pp. 824-834. 

[7] Hartley, H. O. (1958). Maximum likelihood estimation from incomplete data, Bio- 
metrics, June , pp 174-194. 

[8] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete 
observations. J. Am. Stat. Assoc. Vol. 53. pp. 457-481. 

[9] Klein, J. P. and Zhang, M. J. (1996). Statistical challenges in comparing chemother- 
apy and bone marrow transplantation as a treatment for leukemia, Lifetime Data: 
Models in Reliability and Survival Analysis, N.P. Jewel, 175-185. 

[10] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer,
New York.

[11] Lynden-Bell, D. (1971). A method of allowing for known observational selection in 
small samples applied to 3CR quasars. Mon. Not. R. Astr. Soc. Vol.155, pp. 95-118. 

[12] Parzen, E. (1962). On estimation of a probability density function and mode. Ann. 
Math. Stat, 1065-1076. 

[13] Shaw, D. (1988). On-Site samples regression problems of nonnegative integers, trun- 
cation, and endogenous stratification. Journal of Econometrics, 37, pp. 211-223. 

[14] Seidel, W., Mosler, K., and Alker, M. (2000). A cautionary note on likelihood ratio
tests in mixture models. Ann. Inst. Stat. Math., 52, 481-487.

[15] Snedecor, G. W. (1956). Statistical Methods. 5th ed., The Iowa State College Press, 
Ames, Iowa. 

[16] Taylor, J. A. and Jakeman, A. J. (1985). Identification of a distributional model. 
Commun. Statist.- Simula. Computa., 14(2), 497-508. 






